COMPUTERS/COMMUNICATIONS 


The  information  age  is  burying  business  in  information. 
To  the  rescue  come  firms  like  Verity,  with  systems 
for  targeted  text  retrieval. 


Haystack searching


By  David  Churbuck 

You can blame George Boole, the 19th-century logician, for the
frustration computer users suffer trying to extract the perfect
piece of information from a database of documents.

Boolean logic, which frames queries with ORs and ANDs, is fine if
the document sought can be precisely targeted by key words, dates
or places of publication. But often it can't be.

Let's suppose that you want to assemble a folder of newspaper and
magazine articles addressing George Bush's reelection prospects.
Too restrictive a search may miss important articles. If you limit
the retrieval to items that mention both the President and
"reelection," for example, you will miss the ones that talk about
his popularity and the coming election but don't contain the word
"reelection." Too broad a query also gives useless results. Ask for
all items containing "George Bush OR President Bush" and one
popular news retrieval service responds that there are 1,866,525
selections.
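The too-narrow and too-broad cases can be sketched in a few lines of Python. This is only an illustration of literal Boolean matching, not any retrieval service's actual code; the four headlines and the function names are invented for the example.

```python
# Toy corpus: these headlines are invented for illustration only.
DOCS = {
    1: "President Bush faces tough reelection fight",
    2: "George Bush popularity slips as election nears",
    3: "President Bush signs highway bill",
    4: "Senate debates farm subsidies",
}

def contains(text, phrase):
    """Literal, case-insensitive substring match."""
    return phrase.lower() in text.lower()

def boolean_and(docs, *phrases):
    """Return ids of documents containing every phrase."""
    return sorted(i for i, t in docs.items()
                  if all(contains(t, p) for p in phrases))

def boolean_or(docs, *phrases):
    """Return ids of documents containing at least one phrase."""
    return sorted(i for i, t in docs.items()
                  if any(contains(t, p) for p in phrases))

# Too narrow: document 2 is about the election but lacks "reelection".
narrow = boolean_and(DOCS, "Bush", "reelection")

# Too broad: every story naming the President matches, relevant or not.
broad = boolean_or(DOCS, "George Bush", "President Bush")

print(narrow)  # [1]
print(broad)   # [1, 2, 3]
```

The AND query drops the popularity story, and the OR query sweeps in the highway bill. Neither says anything about which matches are the better ones.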

The answer to this needle-in-the-haystack problem is intelligent
text retrieval, software that can combine the raw power of a
computer shuffling through tens of millions of documents with the
common sense that a human researcher would bring to the problem.
Most text retrieval being done today is still of the literal-minded,
Boolean variety. But as intelligent searching systems become more
powerful, they could expand the market considerably, from one where
legal, academic and scientific users predominate to one where
corporations routinely search published documents to find out more
about their competitors and customers. Analyst Ann Palermo of
International Data Corp. projects that the market for retrieval
software will quadruple by 1995, to $400 million a year.

Among the players in the smart searching business are Fulcrum, an
Ottawa company that sells text retrieval software to other software
and hardware companies, and Verity Inc., a Mountain View, Calif.
company that has a license to commercialize some of Advanced
Decision Systems Research's techniques. Verity has its roots in
Advanced Decision, now a subsidiary of management consultants Booz,
Allen & Hamilton, which contracts with the government.

Fuzzy retrieval

Classic text searching is very [...] in its Boolean OR and AND
logic. Verity's Topic software, in contrast, allows a researcher to
hint around about his topic of interest and fetch documents that
merely come close. The software is a descendant of the "fuzzy
logic" school of artificial intelligence.

The drawback to Topic is that it demands more mental effort on the
part of the user. But, having drafted a query on a particular
subject, the user can apply it again and again, say, to each day's
wire service stories.

In the example below, a stock analyst needs a system for retrieving
items relating to corporate takeovers by AT&T. He composes a query
he calls "Phone-Deal." The query branches off into two subtopics.
The first lists words suggestive of corporate dealmaking; the
second, the different ways AT&T could be represented in news
articles.

The results of this search range from an item about Teradata's
second-quarter results that describes AT&T's purchase of that
company to an item mentioning AT&T and an "acquisition," a word
that turns out to refer to a government purchasing program.

[illegible]  Teradata Reports Second-Quarter Results

0.80  Firms Dispute FCC Approval of Bell/Metro Merger

[illegible]  NCR Sales Channels Now Marketing the StarServer

0.80  AT&T Reports Earnings

0.75  AT&T Unit Unveils High-Speed Transmission Technology

0.60  BellSouth Reports 1991 Results

0.50  AT&T Waives Sign-Up Fee For Reach Out Ohio Plans

0.40  Pentagon Extends CHAMPUS Contract; Re-Bid Anticipated
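
A two-branch query like "Phone-Deal" can be sketched roughly as follows. This is not Verity's algorithm, which the article does not describe. It assumes the textbook fuzzy-logic operators (OR as maximum, AND as minimum), and every term, weight and sample text below is invented for illustration.

```python
# Hypothetical "Phone-Deal" topic: two subtopics, each a set of weighted terms.
# All terms and weights are assumptions, not taken from Topic itself.
PHONE_DEAL = {
    "deal": {"takeover": 0.9, "merger": 0.8, "purchase": 0.6, "acquisition": 0.5},
    "company": {"at&t": 1.0, "american telephone": 0.9},
}

def subtopic_score(text, terms):
    """Fuzzy OR: the strongest matching term sets the subtopic's score."""
    lowered = text.lower()
    return max((w for term, w in terms.items() if term in lowered), default=0.0)

def topic_score(text, topic):
    """Fuzzy AND: a document scores only as high as its weakest subtopic."""
    return min(subtopic_score(text, terms) for terms in topic.values())

def rank(docs, topic):
    """Score every document, drop zero scores, and list the best first."""
    scored = [(topic_score(text, topic), title) for title, text in docs.items()]
    return sorted((s for s in scored if s[0] > 0), reverse=True)

# Invented sample texts standing in for wire stories.
SAMPLE = {
    "Teradata results": "Teradata's quarter reflects the takeover by AT&T.",
    "Pentagon contract": "AT&T may bid under the Pentagon's acquisition program.",
    "Sign-up fee": "AT&T waives a sign-up fee for calling plans.",
}

ranked = rank(SAMPLE, PHONE_DEAL)
for score, title in ranked:
    print(f"{score:.2f}  {title}")
```

Under these assumed weights the takeover story scores 0.90 and the procurement story 0.50. The sign-up fee story drops out because it matches nothing in the dealmaking branch. The false hit on "acquisition" is still retrieved, but it ranks lower only because that term was given a low weight.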


[...] and Harvard Business School graduate who founded Verity in
1988, calls his firm's approach "conceptual searching." Verity is
aiming its Topic software not at occasional users of the sort who
might call up Dialog or Dow Jones News Retrieval when they visit a
library, but rather at corporations making the same sorts of
inquiries over and over. Example: a pharmaceutical company tracking
adverse reactions to its products through several databases
connected over a local area network. One database would hold Food
and Drug Administration reports on the drugs, another internal lab
reports and another communications from physicians noting
reactions.

Topic takes advantage of the repetitiveness of queries by picking
the user's brain for information about the subjects that are
relevant and their relation to one another. An expert familiar with
the subject of the search assigns weights to search terms, then
composes a sample query on adverse reactions that would link
together terms that might be germane. Whenever a person needed to
research the subject of adverse reactions, he would run the
all-purpose query in conjunction with the name of a specific drug.
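
The idea of a stored, expert-weighted query rerun with a different drug name each time might look something like the sketch below. The drug names, terms, weights, database contents and the capped-sum scoring rule are all hypothetical, chosen only to show the pattern; nothing here reflects Topic's actual query language.

```python
# Hypothetical standing query: terms an expert might weight for "adverse reactions".
ADVERSE_REACTION = {
    "adverse reaction": 0.9,
    "side effect": 0.7,
    "nausea": 0.4,
    "rash": 0.4,
}

def reaction_score(text, weighted_terms=ADVERSE_REACTION):
    """Sum the weights of matching terms, capped at 1.0 (an assumed rule)."""
    lowered = text.lower()
    total = sum(w for term, w in weighted_terms.items() if term in lowered)
    return min(1.0, total)

def search(databases, drug):
    """Run the standing query against every database, for one named drug."""
    hits = []
    for source, reports in databases.items():
        for report in reports:
            if drug.lower() not in report.lower():
                continue  # the report must mention the drug at all
            score = reaction_score(report)
            if score > 0:
                hits.append((score, source, report))
    return sorted(hits, reverse=True)

# Invented stand-ins for the three databases in the example.
DATABASES = {
    "fda": ["Examplex: adverse reaction reported in trial",
            "Otherdrug: side effect noted in elderly patients"],
    "lab": ["Examplex batch passed stability testing"],
    "physicians": ["Patient on Examplex developed a rash and nausea"],
}

hits = search(DATABASES, "Examplex")
for score, source, report in hits:
    print(f"{score:.2f}  [{source}]  {report}")
```

Only the drug name changes from one run to the next. The weighted terms are written once and reused, which is where the up-front effort by the expert pays off.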

Verity's customers include the White House, which uses it to route
wire service reports to the appropriate staffers; Bankers Trust,
which uses it to study competitors; and the Defense Department
agency charged with ensuring that sensitive or dangerous technology
isn't shipped to the wrong hands. "We have a large volume of data
that needs to be sifted through to track those people who may be
shipping restricted technology to the bad guys," says this agency's
Colonel Francis Wilson. "We just can't do that in a timely fashion
by having analysts go through it. With Topic, we can filter the
information based on parameters that set the importance of certain
terms."

Reid explains: "In a Boolean system you issue a query and in
essence segment the database into two sets, those documents that
match and those that don't. Our system establishes another set, the
set that broadly includes everything you might be interested in,
with the system determining the degree to which the document is in
your set."

Like other retrieval systems being proposed in academia or
developed by commercial firms, Verity's can rank retrieved
documents in importance. "Topic is like telling the researcher to
give you everything on a subject but put the good stuff on top,"
says Robert Williams, vice president of marketing at Verity.

Topic can also make connections between documents that fit the
user's query and loosely related documents. Says Williams: "Often
the most interesting result of a search isn't finding the right
document but finding 12 that are related in an interesting way,
inspiring a point of view that you hadn't thought of before."

That's getting more sophisticated, but until smarter systems such
as Topic become more popular, computers will still have a way of
constantly reminding the user that they are just computers. When
Reid used the old-fashioned technique to query a database for
articles about earthquakes, he got back a lot of irrelevant
baseball stories. The search program had concluded from the many
mentions of the interrupted World Series game in stories about the
1989 San Francisco quake that the Oakland A's and the San Francisco
Giants had some causal connection to the subject.

So maybe machines can never function as good researchers without
outside help from humans. But, given the huge volume of text to be
searched, neither can humans function without the machines. ■